# Generalising Nonresponse Prediction in Panel Surveys with Machine Learning

## Abstract

While predictive modeling for unit nonresponse in panel surveys has been explored in various
contexts, it is still under-researched how practitioners can best adopt these techniques. Currently,
practitioners need to wait until they accumulate enough data in their panel to train and
evaluate their own modeling options. This paper presents a novel “cross-training” technique
in which we show that the indicators of nonresponse are so ubiquitous across studies that it
is viable to train a model on one panel study and apply it to a different one. The practical
benefit of this approach is that newly commencing panels can potentially make better
nonresponse predictions in the early waves because these pre-trained models make use of more
data. We demonstrate this technique with five panel surveys which encompass a variety of
survey designs: the Socio-Economic Panel (SOEP), the German Internet Panel (GIP),
the GESIS Panel, the Mannheim Corona Study (MCS), and the Family Demographic Panel
(FREDA). We demonstrate that nonresponse history and demographics, paired with tree-based
modeling methods, make highly accurate and generalizable predictions across studies, despite
differences in panel design. We show how cross-training can effectively predict nonresponse
in early panel waves where attrition is typically highest.
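
As a rough illustration of the cross-training idea, the sketch below fits a model on one panel and scores it on another. Everything in it is an assumption made for illustration: the two tiny hand-written panels, the two features (number of previously missed waves, age), and the model itself, which is a one-split decision stump standing in for the tree-based methods used in the manuscript. It is not the pipeline implemented in the notebooks.

```python
from typing import NamedTuple

# One row = (features, label). Features are (previously missed waves, age);
# label 1 means nonresponse at the next wave. All values are made up.
Row = tuple[tuple[float, ...], int]


class Stump(NamedTuple):
    feature: int      # index of the feature used for the split
    threshold: float  # split point
    above: int        # label predicted when the feature value exceeds the threshold


def predict(stump: Stump, x: tuple[float, ...]) -> int:
    """Predict nonresponse (1) or response (0) for one respondent."""
    return stump.above if x[stump.feature] > stump.threshold else 1 - stump.above


def fit_stump(rows: list[Row]) -> Stump:
    """Pick the single split with the fewest training misclassifications."""
    best, best_errors = None, len(rows) + 1
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            for above in (1, 0):
                candidate = Stump(j, t, above)
                errors = sum(predict(candidate, x) != y for x, y in rows)
                if errors < best_errors:
                    best, best_errors = candidate, errors
    return best


def accuracy(stump: Stump, rows: list[Row]) -> float:
    """Share of rows whose predicted label matches the observed one."""
    return sum(predict(stump, x) == y for x, y in rows) / len(rows)


# Hypothetical "source" panel with a long history, used for training.
panel_a: list[Row] = [
    ((0, 34), 0), ((0, 51), 0), ((1, 45), 0), ((2, 29), 1),
    ((3, 62), 1), ((1, 38), 0), ((2, 47), 1), ((0, 23), 1),
]
# Hypothetical newly started "target" panel, used only for evaluation.
panel_b: list[Row] = [
    ((0, 40), 0), ((2, 55), 1), ((3, 31), 1),
    ((1, 27), 0), ((0, 66), 0), ((1, 49), 1),
]

model = fit_stump(panel_a)                    # train on the source panel
target_accuracy = accuracy(model, panel_b)    # apply to the target panel
print(model, round(target_accuracy, 3))
```

On this toy data the stump splits on the number of previously missed waves and classifies 5 of the 6 target-panel rows correctly. The point is only the workflow: the target panel contributes no training data at all.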


## Replication

1) Acquire the following raw datasets, which are described in the manuscript appendix. Place each .zip file in the appropriate subdirectory under `data/sensitive/`, then unzip it there.
    - Liebig, S., Goebel, J., Grabka, M., Schröder, C., Zinn, S., Bartels, C., . . . Deutsches Institut für Wirtschaftsforschung (DIW Berlin). (2022). Socio-economic panel, data from 1984-2020, (SOEP-core, v37, EU edition) sozio-oekonomisches panel, daten der jahre 1984-2020 (SOEP-core, v37, EU edition). Medium: CSV, SPSS, Stata (bilingual), RData. Version number: v37. Type: dataset. doi:10.5684/SOEP.CORE.V37EU
    - Blom, A. G., Gonzalez Ocanto, M., Krieger, U., Rettig, T., Ungefucht, M., & SFB 884 ´Political Economy of Reforms´, U. M. (2022). German internet panel, wave 58 (march 2022). GESIS, Cologne. ZA7878 Data file Version 1.0.0, https://doi.org/10.4232/1.14054
    - GESIS. (2023). GESIS Panel - Standard Edition. GESIS, Cologne. ZA5665 Data file Version 44.0.0, https://doi.org/10.4232/1.13931
    - Blom, A. G., Cornesse, C., Friedel, S., Krieger, U., Fikel, M., Rettig, T., . . . SFB 884 ´Political Economy of Reforms´, U. M. (2021). Mannheim corona study. GESIS Data Archive, Cologne. ZA7745 Data file Version 1.0.0, https://doi.org/10.4232/1.13700
    - Bujard, M., Gummer, T., Hank, K., Neyer, F. J., Pollak, R., Schneider, N. F., . . . Weih, U. (2023). Freda – the german family demography panel study (study no.za7777; data file version 2.0.0). Accessed: 2024-12-31. GESIS. Retrieved from http://dx.doi.org/10.4232/1.14065 

2) For the FREDA panel, the AAPOR nonresponse codes for each wave in release za7777.v2.0.0 are currently available only on special request.
Contact GESIS and ask for these data as a .dta file containing the variables listed below. Save the file as `data/sensitive/FREDA/20231208_FReDA_W1_nonresponse_reasons.dta`.
    - `IDsuf`
    - `id`
    - `Ausfallgrund_W1R`
    - `erg_last_contact_W1R`
    - `erg_last_contact_W1A`
    - `Ausfallgrund_W1A`
    - `erg_last_contact_W1B`
    - `Ausfallgrund_W1B`
3) Create a virtual environment with Python 3.10.7.
4) In the environment, run `pip install -r requirements.txt`.
5) Run the notebooks in the following order:
    - 01a_ETL_SOEP
    - 01b_ETL_FREDA
    - 01c_ETL_GESIS
    - 01d_ETL_GIP
    - 01e_ETL_MCS
    - 02_Data_Inspection
    - 03_Model_Comparison
    - 04_Cross_Modeling
    - 05_Analysis_Model_Comparison
    - 06_Analysis_Cross_Modeling
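
Steps 1 and 5 can also be scripted. The sketch below is a minimal, unofficial helper, not part of the repository: it extracts every .zip found under `data/sensitive/` next to the archive and then executes the notebooks in the listed order with `jupyter nbconvert`. It assumes that the notebooks carry an `.ipynb` suffix and sit in the repository root, and that `jupyter` is available in the environment created in steps 3 and 4. The function names are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path

# Notebook names as listed in step 5. The ".ipynb" suffix and the location
# in the repository root are assumptions.
NOTEBOOKS = [
    "01a_ETL_SOEP",
    "01b_ETL_FREDA",
    "01c_ETL_GESIS",
    "01d_ETL_GIP",
    "01e_ETL_MCS",
    "02_Data_Inspection",
    "03_Model_Comparison",
    "04_Cross_Modeling",
    "05_Analysis_Model_Comparison",
    "06_Analysis_Cross_Modeling",
]


def unzip_all(data_root: Path) -> list[Path]:
    """Extract every .zip under data_root into the directory that holds it."""
    extracted = []
    for archive in sorted(data_root.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted


def notebook_commands(notebook_dir: Path) -> list[list[str]]:
    """Build one `jupyter nbconvert` command per notebook, in run order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(notebook_dir / f"{name}.ipynb")]
        for name in NOTEBOOKS
    ]


def run_pipeline(repo_root: Path) -> None:
    """Unzip the raw data, then execute the notebooks one after another."""
    unzip_all(repo_root / "data" / "sensitive")
    for cmd in notebook_commands(repo_root):
        # check=True stops the run at the first notebook that fails.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(Path("."))` from the repository root would run the whole sequence. Because the ETL notebooks feed the later ones, stopping at the first failure avoids analysing stale intermediate files.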